Processing Serbian Written Texts: An Overview of Resources and Basic Tools

Authors

  • Duško Vitas
  • Gordana Pavlović-Lažetić
  • Cvetana Krstev
  • Ljubomir Popović
  • Ivan Obradović
Abstract

In this paper we describe the resources and tools for processing texts written in Serbian that have been developed by the University of Belgrade NLP group at the Faculty of Mathematics. The main features of these resources, namely the available monolingual and multilingual corpora and various e-dictionaries, are briefly described. We also outline how Intex, the group's main tool, is used for recognizing unknown words, tagging texts, building local grammars and disambiguation.
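To make the dictionary-based steps above more concrete, here is a minimal sketch, assuming e-dictionary entries in the DELAF-style text format read by Intex (inflected form, lemma, category and inflectional codes). It looks each token up, keeps every analysis it finds so that ambiguous forms retain all their readings for a later disambiguation step, and flags tokens missing from the dictionary as unknown words. The sample entries, the function names `parse_delaf` and `tag`, and the simple regex tokenizer are hypothetical; this is not the group's implementation, and it ignores escaped commas and dots, compound words, and Intex local grammars.

```python
import re
from collections import defaultdict

# Hypothetical sample entries in DELAF style: "form,lemma.CATEGORY+features:codes".
# They are illustrative only, not taken from the actual Serbian e-dictionaries.
SAMPLE_DELAF = """\
knjige,knjiga.N+Conc:fs2:fp1:fp4:fp5
knjiga,knjiga.N+Conc:fs1:fp2
čita,čitati.V+Imperf:P3s
student,student.N+Hum:ms1
"""


def parse_delaf(text):
    """Map each inflected form to a list of (lemma, grammatical codes) analyses."""
    index = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Simplification: assumes no escaped commas or dots inside forms or lemmas.
        form, rest = line.split(",", 1)
        lemma, codes = rest.split(".", 1)
        index[form].append((lemma, codes))
    return index


def tag(sentence, index):
    """Attach all dictionary analyses to each token and collect unknown tokens."""
    tagged, unknown = [], []
    # \w is Unicode-aware in Python 3, so letters such as č and š are kept in tokens.
    for token in re.findall(r"\w+", sentence):
        analyses = index.get(token.lower())  # .get() does not insert into the defaultdict
        if analyses:
            tagged.append((token, analyses))
        else:
            tagged.append((token, []))
            unknown.append(token)
    return tagged, unknown
```

For example, tagging "Student čita knjige Miloša" with the sample entries leaves only `Miloša` in the unknown list; proper names and other forms absent from the dictionary are exactly the tokens that unknown-word recognition has to deal with.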

Publication date: 2006